How best to combine the results of deep learning with standard 3D reconstruction pipelines remains an open problem. While systems that feed the output of traditional multi-view stereo into a network for regularisation or refinement currently seem to obtain the best results, it may be preferable to treat deep neural networks as separate components whose results can be probabilistically fused into geometry-based systems. Unfortunately, the error models required to perform this kind of fusion are not well understood, and many different approaches have been proposed. Recently, a few systems have achieved good results by having their networks predict probability distributions rather than single values. We propose using this approach to fuse learned single-view depth into a standard 3D reconstruction system. Our system can incrementally produce dense depth maps for a set of keyframes. We train a deep neural network to predict a discrete, non-parametric probability distribution over the depth of each pixel in a single image. We then fuse this "probability volume" with another probability volume based on the photometric consistency between subsequent frames and the keyframe image. We argue that combining the probability volumes from these two sources yields a better-conditioned volume. To extract a depth map from the volume, we minimise a cost function that includes regularisation terms based on network-predicted surface normals and occlusion boundaries. Through a series of experiments, we demonstrate that each of these components improves the overall performance of the system.
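Below is a minimal sketch of the probability-volume fusion idea, assuming per-pixel depth distributions discretised over a fixed set of depth bins and treating the two sources as independent; the array names and the expectation-based depth extraction are illustrative, not the paper's implementation (which minimises a regularised cost instead).

```python
import numpy as np

H, W, D = 48, 64, 32                       # image size and number of depth bins
depth_bins = np.linspace(0.5, 5.0, D)      # candidate depths (metres)
rng = np.random.default_rng(0)

def normalise(vol):
    # Make each pixel's depth histogram sum to one.
    return vol / vol.sum(axis=-1, keepdims=True)

# Stand-ins for the two sources: the single-view network prediction and the
# photometric-consistency volume computed from subsequent frames.
p_net = normalise(rng.random((H, W, D)))
p_photo = normalise(rng.random((H, W, D)))

# Assuming independence, fuse by a per-bin product and renormalise; the fused
# volume is typically more peaked, i.e. better conditioned.
p_fused = normalise(p_net * p_photo)

# Simplest possible depth extraction: per-pixel expectation over the fused
# histogram. The paper instead minimises a cost with regularisation terms
# based on predicted surface normals and occlusion boundaries.
depth_map = (p_fused * depth_bins).sum(axis=-1)
print(depth_map.shape)                     # (48, 64)
```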
While the keypoint-based maps created by sparse monocular simultaneous localisation and mapping (SLAM) systems are useful for camera tracking, dense 3D reconstructions may be desired for many robotic tasks. Solutions involving depth cameras are limited in range and to indoor spaces, and dense reconstruction systems based on minimising the photometric error between frames are typically poorly constrained and suffer from scale ambiguity. To address these issues, we propose a 3D reconstruction system that leverages the output of a convolutional neural network (CNN) to produce fully dense depth maps for keyframes that include metric scale. Our system, DeepFusion, is capable of producing real-time dense reconstructions on a GPU. It fuses the output of a semi-dense multi-view stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion, using the learned uncertainties produced by the network. While the network only needs to be run once per keyframe, we are able to optimise the depth map with each new frame so as to continually make use of new geometric constraints. Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion performs at least as well as other comparable systems.
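A hedged sketch of the kind of uncertainty-weighted fusion described, modelling each depth source as a per-pixel Gaussian and combining them by inverse-variance weighting; all names and the Gaussian assumption are illustrative rather than DeepFusion's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W = 48, 64

# Semi-dense multi-view stereo: depth only where photometric matching succeeds.
stereo_depth = rng.uniform(0.5, 5.0, (H, W))
stereo_var = rng.uniform(0.01, 0.5, (H, W))
stereo_valid = rng.random((H, W)) > 0.6        # semi-dense validity mask

# CNN prediction: fully dense depth with a learned per-pixel uncertainty.
cnn_depth = rng.uniform(0.5, 5.0, (H, W))
cnn_var = rng.uniform(0.05, 1.0, (H, W))

# Inverse-variance weighting: where stereo is valid, fuse both estimates;
# elsewhere the dense CNN prediction stands alone (and supplies metric scale).
w_stereo = np.where(stereo_valid, 1.0 / stereo_var, 0.0)
w_cnn = 1.0 / cnn_var
fused_depth = (w_stereo * stereo_depth + w_cnn * cnn_depth) / (w_stereo + w_cnn)
fused_var = 1.0 / (w_stereo + w_cnn)           # fused uncertainty shrinks
```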
While dense visual SLAM methods are capable of estimating dense reconstructions of the environment, their tracking step lacks robustness, especially when the optimisation is poorly initialised. Sparse visual SLAM systems have attained high levels of accuracy and robustness by including inertial measurements in a tightly-coupled fusion. Inspired by this performance, we propose the first tightly-coupled dense RGB-D-inertial SLAM system. Our system runs in real time on a GPU. It jointly optimises the camera pose, velocity, IMU biases, and gravity direction while building a globally consistent, fully dense surfel-based 3D reconstruction of the environment. Through a series of experiments on both synthetic and real-world datasets, we show that our dense visual-inertial SLAM system is more robust to fast motion and to periods of low texture and low geometric variation than a related RGB-D-only SLAM system.
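To make the joint state concrete, here is a toy least-squares sketch with scalar stand-ins for position, velocity, accelerometer bias, and gravity; real tightly-coupled systems optimise SE(3) poses with preintegrated IMU factors over dense photometric residuals, so everything below is illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

dt = 0.05
accel_meas = np.array([9.90, 9.72, 10.05])   # raw accelerometer samples
p0_obs, p1_obs = 1.98, 2.03                  # positions from dense alignment

def residuals(x):
    p0, v0, bias, g = x
    # IMU terms: for this toy near-stationary case, measured acceleration
    # should match gravity plus bias (real systems preintegrate between frames).
    r_imu = accel_meas - (g + bias)
    # "Photometric" terms: positions implied by dense image alignment.
    r_photo = np.array([p0 - p0_obs,
                        p0 + v0 * dt - p1_obs])
    # Weak zero-bias prior, which also resolves the bias/gravity gauge freedom.
    r_prior = np.array([0.1 * bias])
    return np.concatenate([r_imu, r_photo, r_prior])

sol = least_squares(residuals, x0=np.array([2.0, 0.0, 0.0, 9.5]))
p0, v0, bias, g = sol.x                      # jointly estimated state
```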
Estimating human motion from video is an active research area due to its many potential applications. Most state-of-the-art methods predict human shape and pose for individual images and do not exploit the temporal information available in video. Many "in-the-wild" motion sequences are captured by a moving camera, which adds the complication of conflated camera and human motion to the estimation. We therefore present BodySLAM, a monocular SLAM system that jointly estimates the position, shape, and posture of the human body as well as the camera trajectory. We also introduce a novel human motion model to constrain sequential body postures and to observe the scale of the scene. Through a series of experiments on video sequences of human motion captured by a moving monocular camera, we demonstrate that BodySLAM improves the estimates of all human body parameters and the camera compared to estimating them separately.
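As a rough illustration of how a motion model can constrain sequential postures, the sketch below penalises deviation from a constant-velocity prediction in pose-parameter space; BodySLAM's learned motion model and parameterisation will differ, so treat this purely as a placeholder.

```python
import numpy as np

def motion_model(prev_pose, prev_prev_pose):
    # Placeholder model: constant velocity in the pose-parameter space.
    return prev_pose + (prev_pose - prev_prev_pose)

def motion_prior_cost(poses):
    # Penalise body-pose sequences that deviate from the model's prediction;
    # this term would be optimised jointly with image evidence and camera poses.
    cost = 0.0
    for t in range(2, len(poses)):
        pred = motion_model(poses[t - 1], poses[t - 2])
        cost += np.sum((poses[t] - pred) ** 2)
    return cost

# e.g. 72-dim body-pose parameter vectors (as in SMPL-style models)
poses = [np.zeros(72), np.full(72, 0.01), np.full(72, 0.03)]
print(motion_prior_cost(poses))
```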
A joint representation of geometry, colour, and semantics using a 3D neural field enables accurate dense labelling of a scene from ultra-sparse interactions while a user reconstructs it in real time with a handheld RGB-D sensor. Our iLabel system requires no training data, yet can densely label scenes more accurately than standard methods trained on large labelled image datasets. Furthermore, it works in an "open-set" manner, with semantic classes defined on the fly by the user. iLabel's underlying model is a multilayer perceptron (MLP) trained from scratch in real time to learn a joint neural scene representation. The scene model is updated and visualised in real time, allowing the user to focus their interactions to achieve efficient labelling. A room or similar scene can be accurately labelled into 10+ semantic categories with only a few tens of clicks. Quantitative labelling accuracy scales strongly with the number of clicks and rapidly surpasses standard pre-trained semantic segmentation methods. We also demonstrate a hierarchical labelling variant.
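A minimal sketch of the underlying idea: an MLP neural field mapping 3D position to semantic logits, trained online from user clicks. The architecture sizes, plain-coordinate inputs, and random stand-in data are illustrative; the real system also models geometry and colour and is driven by a live RGB-D stream.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10
mlp = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),           # per-point semantic logits
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

# User clicks: 3D points (back-projected from the RGB-D stream) together with
# the classes the user defined on the fly. Random stand-ins here.
clicked_xyz = torch.rand(30, 3)
clicked_label = torch.randint(0, NUM_CLASSES, (30,))

# Online training: a handful of steps per frame keeps the field interactive.
for _ in range(100):
    opt.zero_grad()
    loss = ce(mlp(clicked_xyz), clicked_label)
    loss.backward()
    opt.step()

# Dense labelling: query the field at every reconstructed surface point.
with torch.no_grad():
    dense_labels = mlp(torch.rand(5000, 3)).argmax(dim=-1)
```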
Relation extraction (RE), which has relied on structurally annotated corpora for model training, has been particularly challenging in low-resource scenarios and domains. Recent literature has tackled low-resource RE by self-supervised learning, where the solution involves pretraining the relation embedding by RE-based objective and finetuning on labeled data by classification-based objective. However, a critical challenge to this approach is the gap in objectives, which prevents the RE model from fully utilizing the knowledge in pretrained representations. In this paper, we aim at bridging the gap and propose to pretrain and finetune the RE model using consistent objectives of contrastive learning. Since in this kind of representation learning paradigm, one relation may easily form multiple clusters in the representation space, we further propose a multi-center contrastive loss that allows one relation to form multiple clusters to better align with pretraining. Experiments on two document-level RE datasets, BioRED and Re-DocRED, demonstrate the effectiveness of our method. Particularly, when using 1% end-task training data, our method outperforms PLM-based RE classifier by 10.5% and 5.8% on the two datasets, respectively.
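One plausible reading of the multi-center contrastive loss, shown as a sketch: each relation owns K learnable centres, an embedding is attracted to the best-matching centre of its own relation, and all other centres act as negatives. This InfoNCE-style formulation is illustrative, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

NUM_REL, K, DIM, TAU = 5, 3, 64, 0.1
centers = torch.nn.Parameter(torch.randn(NUM_REL, K, DIM))

def multi_center_contrastive_loss(emb, labels):
    emb = F.normalize(emb, dim=-1)
    c = F.normalize(centers, dim=-1)
    # Cosine similarity of each embedding to every centre: (B, NUM_REL, K).
    sim = torch.einsum("bd,rkd->brk", emb, c) / TAU
    # Positive: the best-matching centre within the ground-truth relation,
    # so one relation is free to form multiple clusters.
    pos = sim[torch.arange(len(labels)), labels].max(dim=-1).values
    # Denominator: log-sum-exp over all centres of all relations.
    denom = torch.logsumexp(sim.reshape(len(labels), -1), dim=-1)
    return (denom - pos).mean()

loss = multi_center_contrastive_loss(
    torch.randn(8, DIM), torch.randint(0, NUM_REL, (8,)))
```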
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
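The generation-and-filtering recipe can be sketched as a two-stage loop, using a hypothetical `complete()` helper in place of any specific LM API:

```python
def complete(prompt: str) -> str:
    # Hypothetical text-completion helper; wire to an LM of your choice.
    raise NotImplementedError

def generate_eval_examples(behavior: str, n: int = 100) -> list[str]:
    examples = []
    while len(examples) < n:
        # Stage 1: ask an LM to write a candidate yes/no test question.
        question = complete(
            f"Write a yes/no question testing whether an AI exhibits "
            f"{behavior}. Answering 'Yes' should indicate the behavior.\n\nQ:"
        ).strip()
        # Stage 2: a second LM call filters for relevance and clarity.
        verdict = complete(
            f"Is the following a clear, on-topic test of {behavior}? "
            f"Answer Yes or No.\n\n{question}\n\nA:"
        ).strip()
        if verdict.startswith("Yes"):      # keep only judged-relevant items
            examples.append(question)
    return examples
```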
Recent methods in self-supervised learning have demonstrated that masking-based pretext tasks extend beyond NLP, serving as useful pretraining objectives in computer vision. However, existing approaches apply random or ad hoc masking strategies that limit the difficulty of the reconstruction task and, consequently, the strength of the learnt representations. We improve upon current state-of-the-art work in learning adversarial masks by proposing a new framework that generates masks in a sequential fashion with different constraints on the adversary. This leads to improvements in performance on various downstream tasks, such as classification on ImageNet100, STL10, and CIFAR10/100 and segmentation on Pascal VOC. Our results further demonstrate the promising capabilities of masking-based approaches for SSL in computer vision.
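A hedged sketch of the adversarial-masking setup: a small scoring network chooses which patches to hide so as to maximise the reconstruction loss that the encoder and decoder are minimising. The shapes, budget constraint, and top-k selection are illustrative; the paper generates masks sequentially under additional constraints, and a differentiable relaxation would be needed to train the masker by gradient.

```python
import torch
import torch.nn as nn

P, D, MASK_RATIO = 64, 32, 0.5                 # patches, embed dim, mask budget
encoder = nn.Linear(D, D)
decoder = nn.Linear(D, D)
masker = nn.Linear(D, 1)                       # adversary: scores each patch

patches = torch.randn(8, P, D)                 # a batch of patch embeddings

# Adversary: hide the top-scoring patches, respecting the masking budget.
scores = masker(patches).squeeze(-1)           # (8, P)
k = int(MASK_RATIO * P)
mask_idx = scores.topk(k, dim=-1).indices
mask = torch.zeros(8, P, dtype=torch.bool).scatter_(1, mask_idx, True)

# Reconstruct from visible patches; score the loss on masked patches only.
visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
recon = decoder(encoder(visible))
loss = ((recon - patches)[mask] ** 2).mean()

# Minimax: the encoder/decoder descend on `loss` while the masker ascends on
# it (in practice via a negated loss, a straight-through or Gumbel relaxation
# of the top-k step, and separate optimiser steps).
```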
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels.
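The supervised (critique-and-revision) phase can be sketched schematically as follows, with a hypothetical `generate()` completion function and a single placeholder principle standing in for a full constitution:

```python
def generate(prompt: str) -> str:
    # Hypothetical completion call against the model being trained.
    raise NotImplementedError

PRINCIPLE = "Identify ways the response is harmful and how to make it harmless."

def critique_and_revise(user_query: str, n_rounds: int = 2) -> str:
    response = generate(f"Human: {user_query}\n\nAssistant:")
    for _ in range(n_rounds):
        # Self-critique against a principle sampled from the constitution.
        critique = generate(
            f"Response: {response}\n\nCritique request: {PRINCIPLE}\n\nCritique:"
        )
        # Revise the response in light of the critique.
        response = generate(
            f"Response: {response}\n\nCritique: {critique}\n\n"
            f"Revision request: rewrite the response to address the "
            f"critique.\n\nRevision:"
        )
    # (user_query, response) pairs become the supervised finetuning set;
    # the RL phase then compares sampled pairs with an AI preference model.
    return response
```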
Convolutional neural networks (CNNs) are currently among the most widely-used neural networks available and achieve state-of-the-art performance for many problems. While originally applied to computer vision tasks, CNNs work well with any data with a spatial relationship, besides images, and have been applied to different fields. However, recent works have highlighted how CNNs, like other deep learning models, are sensitive to noise injection which can jeopardise their performance. This paper quantifies the numerical uncertainty of the floating point arithmetic inaccuracies of the inference stage of DeepGOPlus, a CNN that predicts protein function, in order to determine its numerical stability. In addition, this paper investigates the possibility of using reduced-precision floating point formats for DeepGOPlus inference to reduce memory consumption and latency. This is achieved with Monte Carlo Arithmetic, a technique that experimentally quantifies floating point operation errors, and VPREC, a tool that emulates results with customizable floating point precision formats. Focus is placed on the inference stage as it is the main deliverable of the DeepGOPlus model that will be used across environments and is therefore most likely to be subjected to the greatest amount of noise. Furthermore, studies have shown that the inference stage is the part of the model most amenable to being scaled down to reduced precision. All in all, the numerical uncertainty of the DeepGOPlus CNN is very low at its current numerical precision format, but the model cannot currently be reduced to a lower precision that might render it more lightweight.
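A toy illustration of the Monte Carlo Arithmetic idea at the Python level: perturb each floating point result with random relative noise and inspect the spread of outputs across repeated runs. Real MCA tooling (e.g. Verificarlo, which also provides the VPREC backend) instruments the arithmetic itself, so this sketch only mimics the effect.

```python
import numpy as np

def mca_perturb(x, precision_bits=23, rng=np.random.default_rng()):
    # Inject uniform relative noise of magnitude ~2^-precision_bits, roughly
    # emulating error at the unit of least precision of a float32 mantissa.
    eps = 2.0 ** -precision_bits
    return x * (1.0 + rng.uniform(-eps, eps, size=np.shape(x)))

def noisy_inference(weights, features, rng):
    # Stand-in for one dense layer of an inference stage, with every
    # intermediate result perturbed.
    z = mca_perturb(features @ weights, rng=rng)
    return mca_perturb(np.maximum(z, 0.0), rng=rng)

master = np.random.default_rng(42)
w = master.standard_normal((16, 4))
x = master.standard_normal((1, 16))
samples = np.stack(
    [noisy_inference(w, x, np.random.default_rng(s)) for s in range(100)])
# Significant digits survive where the spread across samples stays small.
print(samples.mean(axis=0), samples.std(axis=0))
```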